library(vcfR) #v1.14.0
library(ggplot2) #v3.4.1
library(adegenet) #v2.1.10
library(SNPfiltR) #v1.0.1
library(StAMPP) #v1.6.3Zosterops RADseq filtering
1 Load R packages
We will follow the SNP filtering protocol outlined by the SNPfiltR R package in (DeRaad 2022).
2 Read in data
We will read in the unfiltered SNPs for all samples output as a vcf file after using BWA (Li and Durbin 2009) to map our raw RADseq data to the Zosterops japonicus reference genome (available at: https://www.ncbi.nlm.nih.gov/assembly/GCA_017612475.1) (Venkatraman, Fleischer, and Tsuchiya 2021).
#read in vcf
vcfR <- read.vcfR("~/Desktop/cali.zosterops.rad/zost.unfiltered.snps.vcf.gz")#check the metadata for the raw, unfiltered SNP dataset
vcfR***** Object of Class vcfR *****
155 samples
3807 CHROMs
236,767 variants
Object size: 1155 Mb
65.49 percent missing data
***** ***** *****
#read in sample info csv
sample.info<-read.csv("~/Desktop/cali.zosterops.rad/zosterops.RAD.sampling.csv")
#make sure sampling file matches the samples in your vcf
sample.info$ID == colnames(vcfR@gt)[-1]Warning in sample.info$ID == colnames(vcfR@gt)[-1]: longer object length is not
a multiple of shorter object length
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[73] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE
3 Implement quality filters that don’t involve missing data
This is because removing low data samples will alter percentage/quantile based missing data cutoffs, so we wait to implement those until after deciding on our final set of samples for downstream analysis
#hard filter to minimum depth of 5, and minimum genotype quality of 30
vcfR<-hard_filter(vcfR=vcfR, depth = 3, gq = 30)8% of genotypes fall below a read depth of 3 and were converted to NA
1.37% of genotypes fall below a genotype quality of 30 and were converted to NA
Use the function allele_balance() to filter for allele balance from Puritz SNP filtering tutorial “Allele balance: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous, we expect that the allele balance in our data (for real loci) should be close to 0.5”
#execute allele balance filter
vcfR<-filter_allele_balance(vcfR, min.ratio = .10, max.ratio = .90)0.35% of het genotypes (0.01% of all genotypes) fall outside of 0.1 - 0.9 allele balance ratio and were converted to NA
We now want to implement a max depth filter (super high depth loci are likely multiple loci stuck together into a single paralogous locus, which we want to remove).
#visualize and pick appropriate max depth cutoff
max_depth(vcfR)cutoff is not specified, exploratory visualization will be generated.
dashed line indicates a mean depth across all SNPs of 94.1
#filter vcf by the max depth cutoff you chose
vcfR<-max_depth(vcfR, maxdepth = 250)maxdepth cutoff is specified, filtered vcfR object will be returned
14.57% of SNPs were above a mean depth of 250 and were removed from the vcf
Remove SNPs from the vcfR that have become invariant following the removal of questionable genotypes above, and see how many SNPs we have left after this initial set of filters
vcfR<-min_mac(vcfR, min.mac = 1)11.72% of SNPs fell below a minor allele count of 1 and were removed from the VCF
vcfR***** Object of Class vcfR *****
155 samples
3415 CHROMs
178,578 variants
Object size: 713.7 Mb
74.07 percent missing data
***** ***** *****
4 Setting the missing data by sample threshold
Check out the exploratory visualizations and make decisions about which samples to keep for downstream analysis.
#run function to visualize per sample missingness
miss<-missing_by_sample(vcfR)No popmap provided
#run function to visualize per SNP missingness
by.snp<-missing_by_snp(vcfR)cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0655
Warning: Removed 310 rows containing non-finite values
(`stat_density_ridges()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).
Based on these visualizations, we will want to drop the worst sequenced samples that are dragging down the rest of the dataset. Drop those samples based on an approximate missing data proportion cutoff here (this can always be revised if we end up feeling like this is too lenient or stringent later):
#run function to drop samples above the threshold we want from the vcf
vcfR.trim<-missing_by_sample(vcfR=vcfR, cutoff = .9)25 samples are above a 0.9 missing data cutoff, and were removed from VCF
#remove invariant sites generated by sample trimming
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)1.58% of SNPs fell below a minor allele count of 1 and were removed from the VCF
5 Setting the missing data by SNP threshold
Now we will visualize different per SNP missing data thresholds and identify a value that optimizes the trade-off between amount of missing data and the total number of SNPs retained.
#see what effect trimming samples had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0659
#we want to identify the three samples that don't seem to improve much, even as we crank up the missingness per SNP threshold (i.e., the ones lagging out to the right in this plot)
#to do so, we will get an updated list of missingness per sample across a variety of thresholds saved as a dataframe named 'miss'
miss<-missing_by_sample(vcfR.trim)No popmap provided
#to identify those lagging samples, we will isolate the part of the dataframe corresponing to a 90% completeness per SNP filter, where a handful of samples are clearly lagging
m<-miss[miss$filt == 0.9,]
#print the 10 samples with the lowest number of SNPs retained at a 90% per SNP completeness threshold
m[order(m$snps.retained)[1:10],] indiv filt snps.retained
628 ZpalDOT-5746 0.9 6076
611 Zmon27152 0.9 7376
564 ZJsi003 0.9 7478
521 ZJlo010 0.9 8083
599 ZERxx003 0.9 8122
542 ZJja011 0.9 8200
584 ZMOha001 0.9 8355
594 ZMOmo001 0.9 8564
634 Zsim13809 0.9 8641
572 ZJsi031 0.9 8738
#we can now easily remove remaining problematic samples directly with this simple code:
vcfR.trim<-vcfR.trim[,colnames(vcfR.trim@gt) != "ZpalDOT-5746" & colnames(vcfR.trim@gt) != "Zmon27152" & colnames(vcfR.trim@gt) != "ZJsi003" & colnames(vcfR.trim@gt) != "ZJlo010" & colnames(vcfR.trim@gt) != "ZERxx003" & colnames(vcfR.trim@gt) != "ZJja011"]
#drop any SNPs that became invariant because of sample removal
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)1.53% of SNPs fell below a minor allele count of 1 and were removed from the VCF
#see what effect trimming samples had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0627
Visualize sample relatedness at 99% per-SNP completeness cutoff to see our null expectations if there is no missing data effect
#set 99% completeness per SNP threshold
test<-missing_by_snp(vcfR.trim, cutoff=.99)cutoff is specified, filtered vcfR object will be returned
99.54% of SNPs fell below a completeness cutoff of 0.99 and were removed from the VCF
missing_by_sample(test) #no samples above 8% missing dataNo popmap provided
indiv filt snps.retained
1 ZJlo012 0.5 789
2 ZJlo015 0.5 789
3 ZJlo016 0.5 789
4 ZJlo017 0.5 789
5 ZJlo021 0.5 789
6 ZJlo022 0.5 789
7 ZJlo024 0.5 789
8 ZJlo031 0.5 789
9 ZJlo045 0.5 782
10 ZJlo046 0.5 789
11 ZJlo050 0.5 788
12 ZJlo052 0.5 789
13 ZJlo055 0.5 789
14 ZJja001 0.5 771
15 ZJja002 0.5 789
16 ZJja003 0.5 789
17 ZJja004 0.5 760
18 ZJja005 0.5 789
19 ZJja009 0.5 789
20 ZJja010 0.5 778
21 ZJja012 0.5 789
22 ZJja013 0.5 789
23 ZJja014 0.5 789
24 ZJja016 0.5 789
25 ZJja030 0.5 789
26 ZJal003 0.5 789
27 Zjal004 0.5 789
28 ZJal005 0.5 789
29 ZJal010 0.5 789
30 ZJal011 0.5 789
31 ZJal012 0.5 789
32 ZJal020 0.5 789
33 ZJst007 0.5 789
34 ZJst009 0.5 789
35 ZJst010 0.5 789
36 ZJst012 0.5 789
37 ZJst014 0.5 789
38 ZJst015 0.5 789
39 ZJin001 0.5 789
40 ZJin003 0.5 789
41 ZJsi002 0.5 789
42 ZJsi017 0.5 781
43 ZJsi032 0.5 747
44 ZJsi023 0.5 789
45 ZJsi024 0.5 768
46 ZJsi027 0.5 758
47 ZJsi028 0.5 787
48 ZJsi029 0.5 789
49 ZJsi031 0.5 737
50 ZPxx001 0.5 789
51 ZPxx002 0.5 789
52 ZMxx002 0.5 789
53 ZMxx003 0.5 789
54 ZMxx004 0.5 789
55 ZMxx005 0.5 766
56 ZMxx006 0.5 789
57 ZMOxx001 0.5 789
58 ZMOxx002 0.5 789
59 ZMOxx003 0.5 782
60 ZMOxx005 0.5 789
61 ZMOha001 0.5 742
62 ZMOwh002 0.5 770
63 ZMOwh003 0.5 789
64 ZMOwh004 0.5 789
65 ZMOwh005 0.5 770
66 ZMOwh006 0.5 789
67 ZMOvu002 0.5 789
68 ZMOvu003 0.5 789
69 ZMOvu004 0.5 787
70 ZMOvu005 0.5 788
71 ZMOmo001 0.5 753
72 ZMOpa001 0.5 788
73 ZMOpa002 0.5 789
74 ZNni001 0.5 783
75 ZERxx002 0.5 789
76 ZERxx004 0.5 789
77 ZERxx005 0.5 789
78 Zsim23588 0.5 789
79 Zsim28142 0.5 789
80 Zsim30897 0.5 787
81 Zpal23498 0.5 782
82 Zpal23522 0.5 789
83 Zmon20893 0.5 789
84 Zmon20892 0.5 781
85 Zmon20902 0.5 789
86 Zmon20909 0.5 789
87 Zsim31166 0.5 784
88 Zsim31159 0.5 789
89 Zmey17876 0.5 789
90 Zmey17877 0.5 789
91 Zmey17852 0.5 788
92 Zmey17922 0.5 789
93 Zmey17920 0.5 789
94 Zmey17925 0.5 789
95 Zmey17923 0.5 789
96 Zery28090 0.5 789
97 Zery28091 0.5 781
98 Zery28088 0.5 731
99 Zni10863 0.5 784
100 Zni14341 0.5 789
101 Zni19650 0.5 788
102 Zni17984 0.5 784
103 ZjaDOT-10981 0.5 789
104 ZjaDOT-5235 0.5 789
105 Z_LA_122866_2 0.5 789
106 Z_LA_122577_2 0.5 789
107 Z_LA_122188_2 0.5 789
108 Zsim13809 0.5 758
109 Zsim13773 0.5 789
110 Zsim6797 0.5 789
111 Zsim10336 0.5 781
112 Zsim11362 0.5 789
113 Zsim11102 0.5 789
114 Zsim11220 0.5 789
115 Zmon20899 0.5 784
116 Zmon28375 0.5 780
117 Zev13949 0.5 789
118 Zev31650 0.5 781
119 Zev28451 0.5 789
120 Z_HI_BRY431 0.5 787
121 Z_HI_NAN290 0.5 789
122 Z_HI_WAI087 0.5 757
123 Z_HI_WAI078 0.5 781
124 Z_HI_SOL783 0.5 789
125 ZJlo012 0.6 789
126 ZJlo015 0.6 789
127 ZJlo016 0.6 789
128 ZJlo017 0.6 789
129 ZJlo021 0.6 789
130 ZJlo022 0.6 789
131 ZJlo024 0.6 789
132 ZJlo031 0.6 789
133 ZJlo045 0.6 782
134 ZJlo046 0.6 789
135 ZJlo050 0.6 788
136 ZJlo052 0.6 789
137 ZJlo055 0.6 789
138 ZJja001 0.6 771
139 ZJja002 0.6 789
140 ZJja003 0.6 789
141 ZJja004 0.6 760
142 ZJja005 0.6 789
143 ZJja009 0.6 789
144 ZJja010 0.6 778
145 ZJja012 0.6 789
146 ZJja013 0.6 789
147 ZJja014 0.6 789
148 ZJja016 0.6 789
149 ZJja030 0.6 789
150 ZJal003 0.6 789
151 Zjal004 0.6 789
152 ZJal005 0.6 789
153 ZJal010 0.6 789
154 ZJal011 0.6 789
155 ZJal012 0.6 789
156 ZJal020 0.6 789
157 ZJst007 0.6 789
158 ZJst009 0.6 789
159 ZJst010 0.6 789
160 ZJst012 0.6 789
161 ZJst014 0.6 789
162 ZJst015 0.6 789
163 ZJin001 0.6 789
164 ZJin003 0.6 789
165 ZJsi002 0.6 789
166 ZJsi017 0.6 781
167 ZJsi032 0.6 747
168 ZJsi023 0.6 789
169 ZJsi024 0.6 768
170 ZJsi027 0.6 758
171 ZJsi028 0.6 787
172 ZJsi029 0.6 789
173 ZJsi031 0.6 737
174 ZPxx001 0.6 789
175 ZPxx002 0.6 789
176 ZMxx002 0.6 789
177 ZMxx003 0.6 789
178 ZMxx004 0.6 789
179 ZMxx005 0.6 766
180 ZMxx006 0.6 789
181 ZMOxx001 0.6 789
182 ZMOxx002 0.6 789
183 ZMOxx003 0.6 782
184 ZMOxx005 0.6 789
185 ZMOha001 0.6 742
186 ZMOwh002 0.6 770
187 ZMOwh003 0.6 789
188 ZMOwh004 0.6 789
189 ZMOwh005 0.6 770
190 ZMOwh006 0.6 789
191 ZMOvu002 0.6 789
192 ZMOvu003 0.6 789
193 ZMOvu004 0.6 787
194 ZMOvu005 0.6 788
195 ZMOmo001 0.6 753
196 ZMOpa001 0.6 788
197 ZMOpa002 0.6 789
198 ZNni001 0.6 783
199 ZERxx002 0.6 789
200 ZERxx004 0.6 789
201 ZERxx005 0.6 789
202 Zsim23588 0.6 789
203 Zsim28142 0.6 789
204 Zsim30897 0.6 787
205 Zpal23498 0.6 782
206 Zpal23522 0.6 789
207 Zmon20893 0.6 789
208 Zmon20892 0.6 781
209 Zmon20902 0.6 789
210 Zmon20909 0.6 789
211 Zsim31166 0.6 784
212 Zsim31159 0.6 789
213 Zmey17876 0.6 789
214 Zmey17877 0.6 789
215 Zmey17852 0.6 788
216 Zmey17922 0.6 789
217 Zmey17920 0.6 789
218 Zmey17925 0.6 789
219 Zmey17923 0.6 789
220 Zery28090 0.6 789
221 Zery28091 0.6 781
222 Zery28088 0.6 731
223 Zni10863 0.6 784
224 Zni14341 0.6 789
225 Zni19650 0.6 788
226 Zni17984 0.6 784
227 ZjaDOT-10981 0.6 789
228 ZjaDOT-5235 0.6 789
229 Z_LA_122866_2 0.6 789
230 Z_LA_122577_2 0.6 789
231 Z_LA_122188_2 0.6 789
232 Zsim13809 0.6 758
233 Zsim13773 0.6 789
234 Zsim6797 0.6 789
235 Zsim10336 0.6 781
236 Zsim11362 0.6 789
237 Zsim11102 0.6 789
238 Zsim11220 0.6 789
239 Zmon20899 0.6 784
240 Zmon28375 0.6 780
241 Zev13949 0.6 789
242 Zev31650 0.6 781
243 Zev28451 0.6 789
244 Z_HI_BRY431 0.6 787
245 Z_HI_NAN290 0.6 789
246 Z_HI_WAI087 0.6 757
247 Z_HI_WAI078 0.6 781
248 Z_HI_SOL783 0.6 789
249 ZJlo012 0.7 789
250 ZJlo015 0.7 789
251 ZJlo016 0.7 789
252 ZJlo017 0.7 789
253 ZJlo021 0.7 789
254 ZJlo022 0.7 789
255 ZJlo024 0.7 789
256 ZJlo031 0.7 789
257 ZJlo045 0.7 782
258 ZJlo046 0.7 789
259 ZJlo050 0.7 788
260 ZJlo052 0.7 789
261 ZJlo055 0.7 789
262 ZJja001 0.7 771
263 ZJja002 0.7 789
264 ZJja003 0.7 789
265 ZJja004 0.7 760
266 ZJja005 0.7 789
267 ZJja009 0.7 789
268 ZJja010 0.7 778
269 ZJja012 0.7 789
270 ZJja013 0.7 789
271 ZJja014 0.7 789
272 ZJja016 0.7 789
273 ZJja030 0.7 789
274 ZJal003 0.7 789
275 Zjal004 0.7 789
276 ZJal005 0.7 789
277 ZJal010 0.7 789
278 ZJal011 0.7 789
279 ZJal012 0.7 789
280 ZJal020 0.7 789
281 ZJst007 0.7 789
282 ZJst009 0.7 789
283 ZJst010 0.7 789
284 ZJst012 0.7 789
285 ZJst014 0.7 789
286 ZJst015 0.7 789
287 ZJin001 0.7 789
288 ZJin003 0.7 789
289 ZJsi002 0.7 789
290 ZJsi017 0.7 781
291 ZJsi032 0.7 747
292 ZJsi023 0.7 789
293 ZJsi024 0.7 768
294 ZJsi027 0.7 758
295 ZJsi028 0.7 787
296 ZJsi029 0.7 789
297 ZJsi031 0.7 737
298 ZPxx001 0.7 789
299 ZPxx002 0.7 789
300 ZMxx002 0.7 789
301 ZMxx003 0.7 789
302 ZMxx004 0.7 789
303 ZMxx005 0.7 766
304 ZMxx006 0.7 789
305 ZMOxx001 0.7 789
306 ZMOxx002 0.7 789
307 ZMOxx003 0.7 782
308 ZMOxx005 0.7 789
309 ZMOha001 0.7 742
310 ZMOwh002 0.7 770
311 ZMOwh003 0.7 789
312 ZMOwh004 0.7 789
313 ZMOwh005 0.7 770
314 ZMOwh006 0.7 789
315 ZMOvu002 0.7 789
316 ZMOvu003 0.7 789
317 ZMOvu004 0.7 787
318 ZMOvu005 0.7 788
319 ZMOmo001 0.7 753
320 ZMOpa001 0.7 788
321 ZMOpa002 0.7 789
322 ZNni001 0.7 783
323 ZERxx002 0.7 789
324 ZERxx004 0.7 789
325 ZERxx005 0.7 789
326 Zsim23588 0.7 789
327 Zsim28142 0.7 789
328 Zsim30897 0.7 787
329 Zpal23498 0.7 782
330 Zpal23522 0.7 789
331 Zmon20893 0.7 789
332 Zmon20892 0.7 781
333 Zmon20902 0.7 789
334 Zmon20909 0.7 789
335 Zsim31166 0.7 784
336 Zsim31159 0.7 789
337 Zmey17876 0.7 789
338 Zmey17877 0.7 789
339 Zmey17852 0.7 788
340 Zmey17922 0.7 789
341 Zmey17920 0.7 789
342 Zmey17925 0.7 789
343 Zmey17923 0.7 789
344 Zery28090 0.7 789
345 Zery28091 0.7 781
346 Zery28088 0.7 731
347 Zni10863 0.7 784
348 Zni14341 0.7 789
349 Zni19650 0.7 788
350 Zni17984 0.7 784
351 ZjaDOT-10981 0.7 789
352 ZjaDOT-5235 0.7 789
353 Z_LA_122866_2 0.7 789
354 Z_LA_122577_2 0.7 789
355 Z_LA_122188_2 0.7 789
356 Zsim13809 0.7 758
357 Zsim13773 0.7 789
358 Zsim6797 0.7 789
359 Zsim10336 0.7 781
360 Zsim11362 0.7 789
361 Zsim11102 0.7 789
362 Zsim11220 0.7 789
363 Zmon20899 0.7 784
364 Zmon28375 0.7 780
365 Zev13949 0.7 789
366 Zev31650 0.7 781
367 Zev28451 0.7 789
368 Z_HI_BRY431 0.7 787
369 Z_HI_NAN290 0.7 789
370 Z_HI_WAI087 0.7 757
371 Z_HI_WAI078 0.7 781
372 Z_HI_SOL783 0.7 789
373 ZJlo012 0.8 789
374 ZJlo015 0.8 789
375 ZJlo016 0.8 789
376 ZJlo017 0.8 789
377 ZJlo021 0.8 789
378 ZJlo022 0.8 789
379 ZJlo024 0.8 789
380 ZJlo031 0.8 789
381 ZJlo045 0.8 782
382 ZJlo046 0.8 789
383 ZJlo050 0.8 788
384 ZJlo052 0.8 789
385 ZJlo055 0.8 789
386 ZJja001 0.8 771
387 ZJja002 0.8 789
388 ZJja003 0.8 789
389 ZJja004 0.8 760
390 ZJja005 0.8 789
391 ZJja009 0.8 789
392 ZJja010 0.8 778
393 ZJja012 0.8 789
394 ZJja013 0.8 789
395 ZJja014 0.8 789
396 ZJja016 0.8 789
397 ZJja030 0.8 789
398 ZJal003 0.8 789
399 Zjal004 0.8 789
400 ZJal005 0.8 789
401 ZJal010 0.8 789
402 ZJal011 0.8 789
403 ZJal012 0.8 789
404 ZJal020 0.8 789
405 ZJst007 0.8 789
406 ZJst009 0.8 789
407 ZJst010 0.8 789
408 ZJst012 0.8 789
409 ZJst014 0.8 789
410 ZJst015 0.8 789
411 ZJin001 0.8 789
412 ZJin003 0.8 789
413 ZJsi002 0.8 789
414 ZJsi017 0.8 781
415 ZJsi032 0.8 747
416 ZJsi023 0.8 789
417 ZJsi024 0.8 768
418 ZJsi027 0.8 758
419 ZJsi028 0.8 787
420 ZJsi029 0.8 789
421 ZJsi031 0.8 737
422 ZPxx001 0.8 789
423 ZPxx002 0.8 789
424 ZMxx002 0.8 789
425 ZMxx003 0.8 789
426 ZMxx004 0.8 789
427 ZMxx005 0.8 766
428 ZMxx006 0.8 789
429 ZMOxx001 0.8 789
430 ZMOxx002 0.8 789
431 ZMOxx003 0.8 782
432 ZMOxx005 0.8 789
433 ZMOha001 0.8 742
434 ZMOwh002 0.8 770
435 ZMOwh003 0.8 789
436 ZMOwh004 0.8 789
437 ZMOwh005 0.8 770
438 ZMOwh006 0.8 789
439 ZMOvu002 0.8 789
440 ZMOvu003 0.8 789
441 ZMOvu004 0.8 787
442 ZMOvu005 0.8 788
443 ZMOmo001 0.8 753
444 ZMOpa001 0.8 788
445 ZMOpa002 0.8 789
446 ZNni001 0.8 783
447 ZERxx002 0.8 789
448 ZERxx004 0.8 789
449 ZERxx005 0.8 789
450 Zsim23588 0.8 789
451 Zsim28142 0.8 789
452 Zsim30897 0.8 787
453 Zpal23498 0.8 782
454 Zpal23522 0.8 789
455 Zmon20893 0.8 789
456 Zmon20892 0.8 781
457 Zmon20902 0.8 789
458 Zmon20909 0.8 789
459 Zsim31166 0.8 784
460 Zsim31159 0.8 789
461 Zmey17876 0.8 789
462 Zmey17877 0.8 789
463 Zmey17852 0.8 788
464 Zmey17922 0.8 789
465 Zmey17920 0.8 789
466 Zmey17925 0.8 789
467 Zmey17923 0.8 789
468 Zery28090 0.8 789
469 Zery28091 0.8 781
470 Zery28088 0.8 731
471 Zni10863 0.8 784
472 Zni14341 0.8 789
473 Zni19650 0.8 788
474 Zni17984 0.8 784
475 ZjaDOT-10981 0.8 789
476 ZjaDOT-5235 0.8 789
477 Z_LA_122866_2 0.8 789
478 Z_LA_122577_2 0.8 789
479 Z_LA_122188_2 0.8 789
480 Zsim13809 0.8 758
481 Zsim13773 0.8 789
482 Zsim6797 0.8 789
483 Zsim10336 0.8 781
484 Zsim11362 0.8 789
485 Zsim11102 0.8 789
486 Zsim11220 0.8 789
487 Zmon20899 0.8 784
488 Zmon28375 0.8 780
489 Zev13949 0.8 789
490 Zev31650 0.8 781
491 Zev28451 0.8 789
492 Z_HI_BRY431 0.8 787
493 Z_HI_NAN290 0.8 789
494 Z_HI_WAI087 0.8 757
495 Z_HI_WAI078 0.8 781
496 Z_HI_SOL783 0.8 789
497 ZJlo012 0.9 789
498 ZJlo015 0.9 789
499 ZJlo016 0.9 789
500 ZJlo017 0.9 789
501 ZJlo021 0.9 789
502 ZJlo022 0.9 789
503 ZJlo024 0.9 789
504 ZJlo031 0.9 789
505 ZJlo045 0.9 782
506 ZJlo046 0.9 789
507 ZJlo050 0.9 788
508 ZJlo052 0.9 789
509 ZJlo055 0.9 789
510 ZJja001 0.9 771
511 ZJja002 0.9 789
512 ZJja003 0.9 789
513 ZJja004 0.9 760
514 ZJja005 0.9 789
515 ZJja009 0.9 789
516 ZJja010 0.9 778
517 ZJja012 0.9 789
518 ZJja013 0.9 789
519 ZJja014 0.9 789
520 ZJja016 0.9 789
521 ZJja030 0.9 789
522 ZJal003 0.9 789
523 Zjal004 0.9 789
524 ZJal005 0.9 789
525 ZJal010 0.9 789
526 ZJal011 0.9 789
527 ZJal012 0.9 789
528 ZJal020 0.9 789
529 ZJst007 0.9 789
530 ZJst009 0.9 789
531 ZJst010 0.9 789
532 ZJst012 0.9 789
533 ZJst014 0.9 789
534 ZJst015 0.9 789
535 ZJin001 0.9 789
536 ZJin003 0.9 789
537 ZJsi002 0.9 789
538 ZJsi017 0.9 781
539 ZJsi032 0.9 747
540 ZJsi023 0.9 789
541 ZJsi024 0.9 768
542 ZJsi027 0.9 758
543 ZJsi028 0.9 787
544 ZJsi029 0.9 789
545 ZJsi031 0.9 737
546 ZPxx001 0.9 789
547 ZPxx002 0.9 789
548 ZMxx002 0.9 789
549 ZMxx003 0.9 789
550 ZMxx004 0.9 789
551 ZMxx005 0.9 766
552 ZMxx006 0.9 789
553 ZMOxx001 0.9 789
554 ZMOxx002 0.9 789
555 ZMOxx003 0.9 782
556 ZMOxx005 0.9 789
557 ZMOha001 0.9 742
558 ZMOwh002 0.9 770
559 ZMOwh003 0.9 789
560 ZMOwh004 0.9 789
561 ZMOwh005 0.9 770
562 ZMOwh006 0.9 789
563 ZMOvu002 0.9 789
564 ZMOvu003 0.9 789
565 ZMOvu004 0.9 787
566 ZMOvu005 0.9 788
567 ZMOmo001 0.9 753
568 ZMOpa001 0.9 788
569 ZMOpa002 0.9 789
570 ZNni001 0.9 783
571 ZERxx002 0.9 789
572 ZERxx004 0.9 789
573 ZERxx005 0.9 789
574 Zsim23588 0.9 789
575 Zsim28142 0.9 789
576 Zsim30897 0.9 787
577 Zpal23498 0.9 782
578 Zpal23522 0.9 789
579 Zmon20893 0.9 789
580 Zmon20892 0.9 781
581 Zmon20902 0.9 789
582 Zmon20909 0.9 789
583 Zsim31166 0.9 784
584 Zsim31159 0.9 789
585 Zmey17876 0.9 789
586 Zmey17877 0.9 789
587 Zmey17852 0.9 788
588 Zmey17922 0.9 789
589 Zmey17920 0.9 789
590 Zmey17925 0.9 789
591 Zmey17923 0.9 789
592 Zery28090 0.9 789
593 Zery28091 0.9 781
594 Zery28088 0.9 731
595 Zni10863 0.9 784
596 Zni14341 0.9 789
597 Zni19650 0.9 788
598 Zni17984 0.9 784
599 ZjaDOT-10981 0.9 789
600 ZjaDOT-5235 0.9 789
601 Z_LA_122866_2 0.9 789
602 Z_LA_122577_2 0.9 789
603 Z_LA_122188_2 0.9 789
604 Zsim13809 0.9 758
605 Zsim13773 0.9 789
606 Zsim6797 0.9 789
607 Zsim10336 0.9 781
608 Zsim11362 0.9 789
609 Zsim11102 0.9 789
610 Zsim11220 0.9 789
611 Zmon20899 0.9 784
612 Zmon28375 0.9 780
613 Zev13949 0.9 789
614 Zev31650 0.9 781
615 Zev28451 0.9 789
616 Z_HI_BRY431 0.9 787
617 Z_HI_NAN290 0.9 789
618 Z_HI_WAI087 0.9 757
619 Z_HI_WAI078 0.9 781
620 Z_HI_SOL783 0.9 789
621 ZJlo012 1.0 203
622 ZJlo015 1.0 203
623 ZJlo016 1.0 203
624 ZJlo017 1.0 203
625 ZJlo021 1.0 203
626 ZJlo022 1.0 203
627 ZJlo024 1.0 203
628 ZJlo031 1.0 203
629 ZJlo045 1.0 203
630 ZJlo046 1.0 203
631 ZJlo050 1.0 203
632 ZJlo052 1.0 203
633 ZJlo055 1.0 203
634 ZJja001 1.0 203
635 ZJja002 1.0 203
636 ZJja003 1.0 203
637 ZJja004 1.0 203
638 ZJja005 1.0 203
639 ZJja009 1.0 203
640 ZJja010 1.0 203
641 ZJja012 1.0 203
642 ZJja013 1.0 203
643 ZJja014 1.0 203
644 ZJja016 1.0 203
645 ZJja030 1.0 203
646 ZJal003 1.0 203
647 Zjal004 1.0 203
648 ZJal005 1.0 203
649 ZJal010 1.0 203
650 ZJal011 1.0 203
651 ZJal012 1.0 203
652 ZJal020 1.0 203
653 ZJst007 1.0 203
654 ZJst009 1.0 203
655 ZJst010 1.0 203
656 ZJst012 1.0 203
657 ZJst014 1.0 203
658 ZJst015 1.0 203
659 ZJin001 1.0 203
660 ZJin003 1.0 203
661 ZJsi002 1.0 203
662 ZJsi017 1.0 203
663 ZJsi032 1.0 203
664 ZJsi023 1.0 203
665 ZJsi024 1.0 203
666 ZJsi027 1.0 203
667 ZJsi028 1.0 203
668 ZJsi029 1.0 203
669 ZJsi031 1.0 203
670 ZPxx001 1.0 203
671 ZPxx002 1.0 203
672 ZMxx002 1.0 203
673 ZMxx003 1.0 203
674 ZMxx004 1.0 203
675 ZMxx005 1.0 203
676 ZMxx006 1.0 203
677 ZMOxx001 1.0 203
678 ZMOxx002 1.0 203
679 ZMOxx003 1.0 203
680 ZMOxx005 1.0 203
681 ZMOha001 1.0 203
682 ZMOwh002 1.0 203
683 ZMOwh003 1.0 203
684 ZMOwh004 1.0 203
685 ZMOwh005 1.0 203
686 ZMOwh006 1.0 203
687 ZMOvu002 1.0 203
688 ZMOvu003 1.0 203
689 ZMOvu004 1.0 203
690 ZMOvu005 1.0 203
691 ZMOmo001 1.0 203
692 ZMOpa001 1.0 203
693 ZMOpa002 1.0 203
694 ZNni001 1.0 203
695 ZERxx002 1.0 203
696 ZERxx004 1.0 203
697 ZERxx005 1.0 203
698 Zsim23588 1.0 203
699 Zsim28142 1.0 203
700 Zsim30897 1.0 203
701 Zpal23498 1.0 203
702 Zpal23522 1.0 203
703 Zmon20893 1.0 203
704 Zmon20892 1.0 203
705 Zmon20902 1.0 203
706 Zmon20909 1.0 203
707 Zsim31166 1.0 203
708 Zsim31159 1.0 203
709 Zmey17876 1.0 203
710 Zmey17877 1.0 203
711 Zmey17852 1.0 203
712 Zmey17922 1.0 203
713 Zmey17920 1.0 203
714 Zmey17925 1.0 203
715 Zmey17923 1.0 203
716 Zery28090 1.0 203
717 Zery28091 1.0 203
718 Zery28088 1.0 203
719 Zni10863 1.0 203
720 Zni14341 1.0 203
721 Zni19650 1.0 203
722 Zni17984 1.0 203
723 ZjaDOT-10981 1.0 203
724 ZjaDOT-5235 1.0 203
725 Z_LA_122866_2 1.0 203
726 Z_LA_122577_2 1.0 203
727 Z_LA_122188_2 1.0 203
728 Zsim13809 1.0 203
729 Zsim13773 1.0 203
730 Zsim6797 1.0 203
731 Zsim10336 1.0 203
732 Zsim11362 1.0 203
733 Zsim11102 1.0 203
734 Zsim11220 1.0 203
735 Zmon20899 1.0 203
736 Zmon28375 1.0 203
737 Zev13949 1.0 203
738 Zev31650 1.0 203
739 Zev28451 1.0 203
740 Z_HI_BRY431 1.0 203
741 Z_HI_NAN290 1.0 203
742 Z_HI_WAI087 1.0 203
743 Z_HI_WAI078 1.0 203
744 Z_HI_SOL783 1.0 203
#convert to genlight
gen<-vcfR2genlight(test)
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/cali.zosterops.rad/test.99.splits.txt")Try setting the threshold at 90% (should retain around 15K high quality SNPs)
vcfR.trimmed<-missing_by_snp(vcfR=vcfR.trim, cutoff = .9)cutoff is specified, filtered vcfR object will be returned
90.92% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF
miss<-missing_by_sample(vcfR.trimmed)No popmap provided
vcfR.trimmed***** Object of Class vcfR *****
124 samples
239 CHROMs
15,722 variants
Object size: 166.6 Mb
5.316 percent missing data
***** ***** *****
#convert to genlight
gen<-vcfR2genlight(vcfR.trimmed)
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/cali.zosterops.rad/filtered.90.splits.txt")6 Remove overlapping SNPs
It is a known thing (see this) that Stacks will not merge SNPs called twice if they are sequenced by separate (but physically overlapping) loci. To account for this, we will simply remove a SNP every time its chromosome and position have already been seen in the dataset with the following code:
#generate dataframe containing information for chromosome and bp locality of each SNP
df<-as.data.frame(vcfR.trimmed@fix[,1:2])
#calc number of duplicated SNPs to remove
nrow(df) - length(unique(paste(df$CHROM,df$POS)))[1] 18
#remove duplicated SNPs
vcfR.trimmed<-vcfR.trimmed[!duplicated(paste(df$CHROM,df$POS)),]7 Visualize depth and quality across all retained genotypes
#plot depth per snp and per sample
dp <- extract.gt(vcfR.trimmed, element = "DP", as.numeric=TRUE)
heatmap.bp(dp, rlabels = FALSE)#plot genotype quality per snp and per sample
gq <- extract.gt(vcfR.trimmed, element = "GQ", as.numeric=TRUE)
heatmap.bp(gq, rlabels = FALSE)8 linkage filter
Now filter for linkage
#perform linkage filtering to get a reduced vcf with only unlinked SNPs
vcfR.thin<-distance_thin(vcfR.trimmed, min.distance = 1000)
|
| | 0%
|
|= | 1%
|
|= | 2%
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 13%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 87%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 100%
1554 out of 15704 input SNPs were not located within 1000 base-pairs of another SNP and were retained despite filtering
9 write vcf to disk for downstream analyses
#get info for all SNPs passing filtering vcf dataset
vcfR.trimmed***** Object of Class vcfR *****
124 samples
239 CHROMs
15,704 variants
Object size: 166.5 Mb
5.314 percent missing data
***** ***** *****
#write to disk
#vcfR::write.vcf(vcfR.trimmed, file = "~/Desktop/cali.zosterops.rad/filtered.snps.vcf.gz")
#get info for the thinned SNP dataset
vcfR.thin***** Object of Class vcfR *****
124 samples
239 CHROMs
1,554 variants
Object size: 16.3 Mb
5.522 percent missing data
***** ***** *****
#write to disk
#vcfR::write.vcf(vcfR.thin, file = "~/Desktop/cali.zosterops.rad/filtered.unlinked.snps.vcf.gz")